<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Highscore - The Boost C++ Libraries - Parser</title>
<link rel="stylesheet" href="css/highscore.css" type="text/css">
<link rev="made" href="mailto:boris@highscore.de">
<link rel="home" href="frontpage.html" title="The Boost C++ Libraries">
<link rel="up" href="frontpage.html" title="The Boost C++ Libraries">
<link rel="prev" href="serialization.html" title="Chapter 11: Serialization">
<link rel="next" href="containers.html" title="Chapter 13: Containers">
<link rel="chapter" href="introduction.html" title="Chapter 1: Introduction">
<link rel="chapter" href="smartpointers.html" title="Chapter 2: Smart Pointers">
<link rel="chapter" href="functionobjects.html" title="Chapter 3: Function Objects">
<link rel="chapter" href="eventhandling.html" title="Chapter 4: Event Handling">
<link rel="chapter" href="stringhandling.html" title="Chapter 5: String Handling">
<link rel="chapter" href="multithreading.html" title="Chapter 6: Multithreading">
<link rel="chapter" href="asio.html" title="Chapter 7: Asynchronous Input and Output">
<link rel="chapter" href="interprocesscommunication.html" title="Chapter 8: Interprocess Communication">
<link rel="chapter" href="filesystem.html" title="Chapter 9: Filesystem">
<link rel="chapter" href="datetime.html" title="Chapter 10: Date and Time">
<link rel="chapter" href="serialization.html" title="Chapter 11: Serialization">
<link rel="chapter" href="parser.html" title="Chapter 12: Parser">
<link rel="chapter" href="containers.html" title="Chapter 13: Containers">
<link rel="chapter" href="datastructures.html" title="Chapter 14: Data Structures">
<link rel="chapter" href="errorhandling.html" title="Chapter 15: Error Handling">
<link rel="chapter" href="castoperators.html" title="Chapter 16: Cast Operators">
<link rel="section" href="parser.html#parser_general" title="12.1 General">
<link rel="section" href="parser.html#parser_ebnf" title="12.2 Extended Backus-Naur Form">
<link rel="section" href="parser.html#parser_grammar" title="12.3 Grammar">
<link rel="section" href="parser.html#parser_actions" title="12.4 Actions">
<link rel="section" href="parser.html#parser_exercises" title="12.5 Exercises">
<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.highscore.de" r (nz 1 vz 1 lz 1 oz 1 cz 1) gen true for "http://highscore.de" r (nz 1 vz 1 lz 1 oz 1 cz 1))'>
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<link href="http://www.highscore.de/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon">
<script type="text/javascript" src="js/jquery-1.3.2.min.js"></script><script type="text/javascript" src="js/jquery.event.drag-1.5.min.js"></script><script type="text/javascript" src="js/highscore.js"></script>
</head>
<body>
<div lang="en" class="docbook chapter" title="Chapter 12: Parser">
<p class="title">The Boost C++ Libraries</p>
<script type="text/javascript">
          var titlepage = "前言";
        
      var titles = new Array(titlepage,
      
        "第1章: 简介",
      
        "第2章: 智能指针",
      
        "第3章: 函数对象",
      
        "第4章: 事件处理",
      
        "第5章: 字符串处理",
      
        "第6章: 多线程",
      
        "第7章: 异步输入输出",
      
        "第8澡: 进程间通信",
      
        "第9章: 文件系统",
      
        "第10章: 日期与时间",
      
        "第11章: 序列化",
      
        "第12章: 词法分析器",
      
        "第13章: 容器",
      
        "第14章: 数据结构",
      
        "第15章: 错误处理",
      
        "第16章: 转型操作符",
      
      "");

      
          var titlehtml = "frontpage.html";
        
      var filenames = new Array(titlehtml,
      
        "introduction.html",
      
        "smartpointers.html",
      
        "functionobjects.html",
      
        "eventhandling.html",
      
        "stringhandling.html",
      
        "multithreading.html",
      
        "asio.html",
      
        "interprocesscommunication.html",
      
        "filesystem.html",
      
        "datetime.html",
      
        "serialization.html",
      
        "parser.html",
      
        "containers.html",
      
        "datastructures.html",
      
        "errorhandling.html",
      
        "castoperators.html",
      
      "");

      
      document.open();
      document.write('<form action="" class="toc">');
      document.write('<select size="1" onchange="location.href=options[selectedIndex].value">');
      for (var i = 0; i < titles.length && i < filenames.length; ++i) {
        if (titles[i] != "" && filenames[i] != "") {
          document.write('<option');
          document.write(' value="' + filenames[i] + '"');
          var expr = new RegExp('[/\]' + filenames[i] + '$');
          if (expr.test(location.href)) {
            document.write(' selected="selected"');
          }
          document.write('>' + titles[i] + '<\/option>');
        }
      }
      document.write('<\/select>');
      document.write('<\/form>');
      document.close();
      
    </script><noscript><p class="toc"><a href="toc.html">目录</a></p></noscript><hr class="hrhead">
<h1 class="title">
<a name="parser"></a><small>Chapter 12:</small> Parser</h1>
<hr>
<div class="toc">
<h3>Table of Contents</h3>
<ul>
<li><span class="sect1"><a href="parser.html#parser_general">12.1 General</a></span></li>
<li><span class="sect1"><a href="parser.html#parser_ebnf">12.2 Extended Backus-Naur Form</a></span></li>
<li><span class="sect1"><a href="parser.html#parser_grammar">12.3 Grammar</a></span></li>
<li><span class="sect1"><a href="parser.html#parser_actions">12.4 Actions</a></span></li>
<li><span class="sect1"><a href="parser.html#parser_exercises">12.5 Exercises</a></span></li>
</ul>
</div>
<p class="license"><a href="http://creativecommons.org/licenses/by-nc-nd/3.0/" rel="license" target="_top"><img src="img/88x31_cc_logo.gif" alt="" width="88" height="31"></a> This book is licensed under a <a href="http://creativecommons.org/licenses/by-nc-nd/3.0/" rel="license" target="_top">Creative Commons License</a>.</p>
<hr>
<h2 class="title">
<a name="parser_general"></a>12.1 General</h2>
<div class="sect1">
<p>Parsers are used to read formats that allow a flexible but potential complicated structure of data. A good example of such a format is C++ code. The parser of the compiler needs to understand the various language constructs in all possible combinations of C++ in order to translate them to a binary form.</p>
<p>The major issue while developing parsers is usually the sheer amount of rules by which the corresponding data is structured. For example, C++ supports so many language constructs that the development of a corresponding parser would require countless <code class="code">if</code> expressions to recognize any imaginable C++ code as being valid.</p>
<p>The library <a class="link" href="http://www.boost.org/libs/spirit/">Boost.Spirit</a> introduced in this chapter turns the tables on developing parsers. Instead translating explicit rules into code and validating this code using countless <code class="code">if</code> expressions, Boost.Spirit allows to express the rules using the so-called Extended Backus-Naur Form. Using these rules, Boost.Spirit then can parse a C++ source code file accordingly.</p>
<p>The basic idea of Boost.Spirit is similar to regular expressions. Instead of searching a text for a specific pattern using <code class="code">if</code> expressions, the pattern is rather specified as a regular expression. The search is then performed by a library such as Boost.Regex so that the developer does not need to care about the details.</p>
<p>This chapter shows how to use Boost.Spirit to read complicated formats for which regular expressions are no longer feasible. Since Boost.Spirit is a comprehensive library introducing different concepts, a simple parser for the <a class="link" href="http://www.json.org/">JSON format</a> is developed throughout the chapter. JSON is an existing format which is used by e.g. Ajax applications to exchange data similar to XML between applications that potentially may even run on different platforms.</p>
<p>Even though Boost.Spirit simplifies the development of parsers, no one succeeded to write a C++ parser based on this library yet. The development of such a parser is still a long-term goal of Boost.Spirit, however, due to the complexity of the C++ language, has not been achieved so far. Boost.Spirit is currently not well suited for these kind of complex or binary formats.</p>
</div>
<hr>
<h2 class="title">
<a name="parser_ebnf"></a>12.2 Extended Backus-Naur Form</h2>
<div class="sect1">
<p>The Backus-Naur Form, abbreviated BNF, is a language to precisely describe rules and is used in many technical specifications. For example, many specifications of the numerous Internet protocols, the so-called request for comments, contain the rules in BNF in addition to annotations in text.</p>
<p>Boost.Spirit supports the Extended Backus-Naur Form (EBNF) which allows to specify rules in a shorter form than BNF does. The main advantage of EBNF is a shortened and thus simplified notation.</p>
<p>Please note that there are different variations of EBNF which may differ in their syntax. This chapter as well as Boost.Spirit itself uses the EBNF which syntax resembles the one of regular expressions.</p>
<p>In order to use Boost.Spirit, one needs to understand EBNF accordingly. Most of the time, developers already know EBNF and therefore select Boost.Spirit to reuse rules previously specified in EBNF. Below is a short introduction to EBNF; for a quick reference of the syntax used in this chapter and by Boost.Spirit, please refer to the W3C XML specification that contains a <a class="link" href="http://www.w3.org/TR/2004/REC-xml-20040204/#sec-notation">short summary</a>.</p>
<table border="0" width="100%" cellpadding="0" cellspacing="0" class="productionset"><tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">digit</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"</td>
</tr></table>
<p>Strictly speaking, EBNF denotes rules as production rules. Any number of production rules can be combined to describe a particular format. The above format consists of only one production rule. It defines a <code class="nonterminal">digit</code> that is made up of a number between 0 and 9.</p>
<p>Definitions such as <code class="nonterminal">digit</code> are called nonterminal symbols. The numeric values 0 to 9 in the above definition are called terminal symbols instead. These symbols do not have any special meaning and can be easily identified since they are enclosed by double quotes.</p>
<p>All numeric values are connected by the pipe operator which has the same meaning as the <code class="code">||</code> operator in C++: It denotes an alternative.</p>
<p>Summarized, the production rule specifies that any number between 0 and 9 is a <code class="nonterminal">digit</code>.</p>
<table border="0" width="100%" cellpadding="0" cellspacing="0" class="productionset"><tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">integer</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">("+" | "-")? <code class="nonterminal">digit</code>+</td>
</tr></table>
<p>The new nonterminal symbol <code class="nonterminal">integer</code> consists of at least a <code class="nonterminal">digit</code> that can be preceded by a plus or minus sign.</p>
<p>The definition of <code class="nonterminal">integer</code> uses a couple of new operators. Parenthesis are used to create partial expressions just like in mathematics. Other operators can then be applied to these expressions. The question mark denotes that the partial expression can only be declared either once or not at all.</p>
<p>The plus sign following <code class="nonterminal">digit</code> indicates that the corresponding expression must be declared at least once.</p>
<p>This new production rule defines an arbitrary positive or negative integer. While a <code class="nonterminal">digit</code> consists of exactly one digit, an <code class="nonterminal">integer</code> can consist of any combination of digits which can also be marked unsigned or signed. Whereas 5 is both a <code class="nonterminal">digit</code> and an <code class="nonterminal">integer</code>, +5 is solely an <code class="nonterminal">integer</code>. The same applies to 169 or -8 which are also <code class="nonterminal">integer</code> only.</p>
<p>More and more complex production rules can be created by simply defining and combining nonterminal symbols.</p>
<table border="0" width="100%" cellpadding="0" cellspacing="0" class="productionset"><tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">real</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">
<code class="nonterminal">integer</code> "." <code class="nonterminal">digit</code>*</td>
</tr></table>
<p>While the definition of <code class="nonterminal">integer</code> represents integers, the definition of <code class="nonterminal">real</code> represents floating-point numbers. The rule is based on the already defined nonterminal symbols <code class="nonterminal">integer</code> and <code class="nonterminal">digit</code> separated by a dot. The asterisk after <code class="nonterminal">digit</code> specifies that digits after the dot are optional: There can either be an arbitrary number or none.</p>
<p>Floating-point numbers such as 1.2, -16.99 or even 3. satisfy the definition of <code class="nonterminal">real</code>. However, the current definition does not allow floating-point numbers without a leading zero such as .9.</p>
<p>As mentioned in the beginning of this chapter, a parser for the JSON format is going to be developed using Boost.Spirit. For this purpose, the rules of the JSON format need to be expressed in EBNF.</p>
<table border="0" width="100%" cellpadding="0" cellspacing="0" class="productionset">
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">object</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">"{" <code class="nonterminal">member</code> ("," <code class="nonterminal">member</code>)* "}"</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">member</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">
<code class="nonterminal">string</code> ":" <code class="nonterminal">value</code>
</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">string</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">'"' <code class="nonterminal">character</code>* '"'</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">value</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">
<code class="nonterminal">string</code> | <code class="nonterminal">number</code> | <code class="nonterminal">object</code> | <code class="nonterminal">array</code> | "true" | "false" | "null"</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">number</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">
<code class="nonterminal">integer</code> | <code class="nonterminal">real</code>
</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">array</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">"[" <code class="nonterminal">value</code> ("," <code class="nonterminal">value</code>)* "]"</td>
</tr>
<tr>
<td align="right" valign="top" width="10%"><code class="nonterminal">character</code></td>
<td valign="top" width="5%" align="center">=</td>
<td valign="top" width="85%">"a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"</td>
</tr>
</table>
<p>The JSON format is based on objects that contain pairs of keys and values enclosed by curly braces. While the keys are simple strings, values can range from strings, numeric values, arrays, other objects or the literal <code class="literal">true</code>, <code class="literal">false</code> or <code class="literal">null</code>. Strings are a consecutive character stream enclosed by double quotes. Numeric values can either be integers or floating-point numbers. Arrays contain values separated by comma that are enclosed by squared brackets.</p>
<p>Please note that the above definition is not complete. On the one hand, the definition of <code class="nonterminal">character</code> lacks capital letters as well as additional characters; on the other hand, JSON supports specifics such as Unicode or control characters. This can be currently ignored since Boost.Spirit defines frequently used nonterminal symbols such as alphanumeric characters to save one from typing endless character streams. In addition, a string will be later defined in code as a consecutive stream of arbitrary characters excluding double quotes. Since double quotes finalize the string, all other characters can be therefore used within a string. The above EBNF does not express this accordingly since EBNF requires the definition of a nonterminal symbol for all characters except the ones that should be excluded in order to define an exception.</p>
<p>The following is an example for a JSON format for which above definition applies.</p>
<pre class="literallayout">{
  "Boris Schäling" :
  {
    "Male": true,
    "Programming Languages": [ "C++", "Java", "C#" ],
    "Age": 31
  }
}</pre>
<p>The global object is characterized by the outer curly braces and contains a key-value pair. The key is named "Boris Schäling", the value is a new object that contains multiple key-value pairs. While all the keys are strings, the values are the literal <code class="literal">true</code>, an array containing several strings and a numeric value.</p>
<p>The EBNF rules defined above can now be used to develop a parser using Boost.Spirit that is able to read the above JSON format.</p>
</div>
<hr>
<h2 class="title">
<a name="parser_grammar"></a>12.3 Grammar</h2>
<div class="sect1">
<p>After defining the rules for the JSON format in EBNF in the previous section, they now need to be used together with Boost.Spirit somehow. Boost.Spirit actually allows to define the EBNF rules as C++ code by overloading the different operators used by EBNF.</p>
<p>Please note that the EBNF rules need to be modified slightly in order to create valid C++ code. Symbols that are combined by a space in EBNF need to be combined using some operator in C++. Additionally, operators such as the asterisk, the question mark and the plus sign, placed directly behind the corresponding symbol in EBNF, must be placed in front of the symbol in order to use them as unary operators in C++.</p>
<p>The following shows the EBNF rules for the JSON format expressed in C++ code for Boost.Spirit.</p>
<pre class="programlisting">#include &lt;boost/spirit.hpp&gt; 

struct json_grammar 
  : public boost::spirit::grammar&lt;json_grammar&gt; 
{ 
  template &lt;typename Scanner&gt; 
  struct definition 
  { 
    boost::spirit::rule&lt;Scanner&gt; object, member, string, value, number, array; 

    definition(const json_grammar &amp;self) 
    { 
      using namespace boost::spirit; 
      object = "{" &gt;&gt; member &gt;&gt; *("," &gt;&gt; member) &gt;&gt; "}"; 
      member = string &gt;&gt; ":" &gt;&gt; value; 
      string = "\"" &gt;&gt; *~ch_p("\"") &gt;&gt; "\""; 
      value = string | number | object | array | "true" | "false" | "null"; 
      number = real_p; 
      array = "[" &gt;&gt; value &gt;&gt; *("," &gt;&gt; value) &gt;&gt; "]"; 
    } 

    const boost::spirit::rule&lt;Scanner&gt; &amp;start() 
    { 
      return object; 
    } 
  }; 
}; 

int main() 
{ 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/12.3.1/main.cpp">Download source code</a></li></ul>
<p>To use the different classes of Boost.Spirit, the header <code class="filename">boost/spirit.hpp</code> needs to be included. The classes are provided within the namespace <code class="code">boost::spirit</code>.</p>
<p>In order to create a parser with Boost.Spirit, a so-called grammar must be created which among other things defines the rules after which the data is structured. In the above example the <code class="classname">json_grammar</code> class has been developed which is derived from the template class <code class="classname">boost::spirit::grammar</code> and is instantiated  with the name of the class. <code class="classname">json_grammar</code> defines the complete grammar necessary to understand the JSON format.</p>
<p>One important  component of the grammar are the rules to read structured data correctly. These rules are defined within an inner class named <code class="classname">definition</code> - this name is mandatory. This class is a template with one argument which is instantiated with a so-called scanner by Boost.Spirit. A scanner is a concept internally used by Boost.Spirit. Even though it is mandatory for <code class="classname">definition</code> to be a template taking a scanner type as its argument, it is irrelevant for the daily use of Boost.Spirit what these scanners actually are and why they are defined.</p>
<p><code class="classname">definition</code> must define a method named <code class="methodname">start()</code> which is called by Boost.Spirit to obtain the complete rules and standards of the grammar. The return value of this method is a constant reference to <code class="classname">boost::spirit::rule</code> that is also a template class instantiated with the scanner type.</p>
<p>The class <code class="classname">boost::spirit::rule</code> is used to define rules. Nonterminal symbols are defined by means of this class. The previously defined nonterminal symbols <code class="nonterminal">object</code>, <code class="nonterminal">member</code>, <code class="nonterminal">string</code>, <code class="nonterminal">value</code>, <code class="nonterminal">number</code> and <code class="nonterminal">array</code> are all of type <code class="classname">boost::spirit::rule</code>.</p>
<p>All of these objects are defined as properties of the <code class="classname">definition</code> class which is not mandatory but alleviates the definition especially if rules reference each other recursively. As seen with the EBNF examples in the previous section,  recursive references are not an issue.</p>
<p>At first glance, the definitions of the rules within the constructor of <code class="classname">definition</code> look similar to the production rules of the EBNF seen in the previous section. This does not come as a surprise though since this is exactly the goal of Boost.Spirit: To reuse production rules defined in the EBNF.</p>
<p>While the C++ code resembles the rules established in the EBNF, there are certainly small differences in order to write valid C++.  For example, all of the symbols are concatenated via the <code class="code">&gt;&gt;</code> operator. EBNF operators such as the asterisk are placed in front of the symbols instead of following them.  Despite these syntax changes, Boost.Spirit strives to convert EBNF rules to C++ code without many changes.</p>
<p>The constructor of <code class="classname">definition</code> uses two classes provided by Boost.Spirit: <code class="classname">boost::spirit::ch_p</code> and <code class="classname">boost::spirit::real_p</code>.  Frequently used rules are provided in the form of parsers that can easily be reused. One example is <code class="classname">boost::spirit::real_p</code> which allows storing positive and negative integers and floating-point  numbers without the need to define nonterminal symbols such as <code class="nonterminal">digit</code> or <code class="nonterminal">real</code>.</p>
<p><code class="classname">boost::spirit::ch_p</code> can be used to create a parser for a single character which is equal to enclosing the character with double quotes. In the above example, the usage of <code class="classname">boost::spirit::ch_p</code> is mandatory since the tilde and asterisk are applied to the double quote sign. Without the class, the code would read <code class="code">*~"\""</code> which would be rejected by the compiler as invalid code.</p>
<p>The tilde actually allows the trick mentioned in the previous section: By prefixing the double quote with the tilde, all characters except the double quote are accepted.</p>
<p>After the rules for identifying the JSON format are defined, the following example shows how to use them.</p>
<pre class="programlisting">#include &lt;boost/spirit.hpp&gt; 
#include &lt;fstream&gt; 
#include &lt;sstream&gt; 
#include &lt;iostream&gt; 

struct json_grammar 
  : public boost::spirit::grammar&lt;json_grammar&gt; 
{ 
  template &lt;typename Scanner&gt; 
  struct definition 
  { 
    boost::spirit::rule&lt;Scanner&gt; object, member, string, value, number, array; 

    definition(const json_grammar &amp;self) 
    { 
      using namespace boost::spirit; 
      object = "{" &gt;&gt; member &gt;&gt; *("," &gt;&gt; member) &gt;&gt; "}"; 
      member = string &gt;&gt; ":" &gt;&gt; value; 
      string = "\"" &gt;&gt; *~ch_p("\"") &gt;&gt; "\""; 
      value = string | number | object | array | "true" | "false" | "null"; 
      number = real_p; 
      array = "[" &gt;&gt; value &gt;&gt; *("," &gt;&gt; value) &gt;&gt; "]"; 
    } 

    const boost::spirit::rule&lt;Scanner&gt; &amp;start() 
    { 
      return object; 
    } 
  }; 
}; 

int main(int argc, char *argv[]) 
{ 
  std::ifstream fs(argv[1]); 
  std::ostringstream ss; 
  ss &lt;&lt; fs.rdbuf(); 
  std::string data = ss.str(); 

  json_grammar g; 
  boost::spirit::parse_info&lt;&gt; pi = boost::spirit::parse(data.c_str(), g, boost::spirit::space_p); 
  if (pi.hit) 
  { 
    if (pi.full) 
      std::cout &lt;&lt; "parsing all data successfully" &lt;&lt; std::endl; 
    else 
      std::cout &lt;&lt; "parsing data partially" &lt;&lt; std::endl; 
    std::cout &lt;&lt; pi.length &lt;&lt; " characters parsed" &lt;&lt; std::endl; 
  } 
  else 
    std::cout &lt;&lt; "parsing failed; stopped at '" &lt;&lt; pi.stop &lt;&lt; "'" &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/12.3.2/main.cpp">Download source code</a></li></ul>
<p>Boost.Spirit offers a free-standing function named <code class="function">boost::spirit::parse()</code>. By creating an instance of a grammar, a parser is created accordingly which is passed to <code class="function">boost::spirit::parse()</code> as the second argument.  The first argument denotes the text to be parsed while the third argument is a parser indicating which characters should be skipped in the given text. To skip spaces, an object of type <code class="classname">boost::spirit::space_p</code> is passed as the third argument. This simply means that an arbitrary number of spaces is allowed between the data to be captured - in other words, everywhere the <code class="code">&gt;&gt;</code> operator was applied in the rules. Tabulators and line breaks are included which allows for a more flexible notation of data formats.</p>
<p><code class="function">boost::spirit::parse()</code> returns an object of type <code class="classname">boost::spirit::parse_info</code> that offers four properties indicating whether or not the text was successfully parsed. The <var>hit</var> property is set to <code class="literal">true</code> if the text was successfully parsed. If all characters in the text were parsed without e.g. remaining spaces at the end, <var>full</var> is also set to <code class="literal">true</code>. Only if <var>hit</var> equals <code class="literal">true</code>, <var>length</var> is valid and contains the number of characters parsed successfully.</p>
<p>The <var>length</var> property must not be accessed if the text could not be parsed successfully. In this case, the <var>stop</var> property allows to access the text location where parsing stopped. While <var>stop</var> can also be accessed if the text was parsed successfully, it does not make much sense since it will point behind the parsed text in this case.</p>
</div>
<hr>
<h2 class="title">
<a name="parser_actions"></a>12.4 Actions</h2>
<div class="sect1">
<p>So far, one knows how to define a grammar in order to obtain a new parser that can be used to identify whether a specific text is structured according to the rules of the grammar. However, the data format still is not interpreted at this point since the individual data read from a structured format such as JSON are not processed further.</p>
<p>In order to process data satisfying a specific rule as identified by the parser, actions are used.  Actions are functions that are associated with rules. If the parser identifies data satisfying a specific rule, the associated action is executed, passing the data to be processed accordingly as shown in the following example. </p>
<pre class="programlisting">#include &lt;boost/spirit.hpp&gt; 
#include &lt;string&gt; 
#include &lt;fstream&gt; 
#include &lt;sstream&gt; 
#include &lt;iostream&gt; 

struct json_grammar 
  : public boost::spirit::grammar&lt;json_grammar&gt; 
{ 
  struct print 
  { 
    void operator()(const char *begin, const char *end) const 
    { 
      std::cout &lt;&lt; std::string(begin, end) &lt;&lt; std::endl; 
    } 
  }; 

  template &lt;typename Scanner&gt; 
  struct definition 
  { 
    boost::spirit::rule&lt;Scanner&gt; object, member, string, value, number, array; 

    definition(const json_grammar &amp;self) 
    { 
      using namespace boost::spirit; 
      object = "{" &gt;&gt; member &gt;&gt; *("," &gt;&gt; member) &gt;&gt; "}"; 
      member = string[print()] &gt;&gt; ":" &gt;&gt; value; 
      string = "\"" &gt;&gt; *~ch_p("\"") &gt;&gt; "\""; 
      value = string | number | object | array | "true" | "false" | "null"; 
      number = real_p; 
      array = "[" &gt;&gt; value &gt;&gt; *("," &gt;&gt; value) &gt;&gt; "]"; 
    } 

    const boost::spirit::rule&lt;Scanner&gt; &amp;start() 
    { 
      return object; 
    } 
  }; 
}; 

int main(int argc, char *argv[]) 
{ 
  std::ifstream fs(argv[1]); 
  std::ostringstream ss; 
  ss &lt;&lt; fs.rdbuf(); 
  std::string data = ss.str(); 

  json_grammar g; 
  boost::spirit::parse_info&lt;&gt; pi = boost::spirit::parse(data.c_str(), g, boost::spirit::space_p); 
  if (pi.hit) 
  { 
    if (pi.full) 
      std::cout &lt;&lt; "parsing all data successfully" &lt;&lt; std::endl; 
    else 
      std::cout &lt;&lt; "parsing data partially" &lt;&lt; std::endl; 
    std::cout &lt;&lt; pi.length &lt;&lt; " characters parsed" &lt;&lt; std::endl; 
  } 
  else 
    std::cout &lt;&lt; "parsing failed; stopped at '" &lt;&lt; pi.stop &lt;&lt; "'" &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/12.4.1/main.cpp">Download source code</a></li></ul>
<p>Actions are implemented as functions or function objects. The latter are beneficial if the action should be initialized or maintain state information between repeated executions. The above example implements the action as a function object.</p>
<p>The class <code class="classname">print</code> is a function object which writes data to the standard output stream. When called, the overloaded <code class="methodname">operator()()</code> operator receives a pointer to the beginning and end of the data as identified by the rules executing the particular action.</p>
<p>The example associates the action with the nonterminal symbol <code class="nonterminal">string</code> which appears as the first symbol after <code class="nonterminal">member</code>. An instance of type <code class="classname">print</code> is passed to the nonterminal symbol <code class="nonterminal">string</code> inside squared brackets. As <code class="nonterminal">string</code> represents the key within key-value pairs of JSON objects, the overloaded <code class="methodname">operator()()</code> operator of class <code class="classname">print</code> is called for every discovered key which simply writes the key to the standard output stream.</p>
<p>It is now possible to define an arbitrary number of actions or to associate them with an arbitrary number of symbols. In order to associate an action with a literal, a parser must be specified explicitly. This is no different from the definition of the nonterminal symbol <code class="nonterminal">string</code> specifying the <code class="classname">boost::spirit::ch_p</code> class. The following example uses the <code class="classname">boost::spirit::str_p</code> class to associate an object of type <code class="classname">print</code> with the literal <code class="literal">true</code>.</p>
<pre class="programlisting">#include &lt;boost/spirit.hpp&gt; 
#include &lt;string&gt; 
#include &lt;fstream&gt; 
#include &lt;sstream&gt; 
#include &lt;iostream&gt; 

struct json_grammar 
  : public boost::spirit::grammar&lt;json_grammar&gt; 
{ 
  struct print 
  { 
    void operator()(const char *begin, const char *end) const 
    { 
      std::cout &lt;&lt; std::string(begin, end) &lt;&lt; std::endl; 
    } 

    void operator()(const double d) const 
    { 
      std::cout &lt;&lt; d &lt;&lt; std::endl; 
    } 
  }; 

  template &lt;typename Scanner&gt; 
  struct definition 
  { 
    boost::spirit::rule&lt;Scanner&gt; object, member, string, value, number, array; 

    definition(const json_grammar &amp;self) 
    { 
      using namespace boost::spirit; 
      object = "{" &gt;&gt; member &gt;&gt; *("," &gt;&gt; member) &gt;&gt; "}"; 
      member = string[print()] &gt;&gt; ":" &gt;&gt; value; 
      string = "\"" &gt;&gt; *~ch_p("\"") &gt;&gt; "\""; 
      value = string | number | object | array | str_p("true")[print()] | "false" | "null"; 
      number = real_p[print()]; 
      array = "[" &gt;&gt; value &gt;&gt; *("," &gt;&gt; value) &gt;&gt; "]"; 
    } 

    const boost::spirit::rule&lt;Scanner&gt; &amp;start() 
    { 
      return object; 
    } 
  }; 
}; 

int main(int argc, char *argv[]) 
{ 
  std::ifstream fs(argv[1]); 
  std::ostringstream ss; 
  ss &lt;&lt; fs.rdbuf(); 
  std::string data = ss.str(); 

  json_grammar g; 
  boost::spirit::parse_info&lt;&gt; pi = boost::spirit::parse(data.c_str(), g, boost::spirit::space_p); 
  if (pi.hit) 
  { 
    if (pi.full) 
      std::cout &lt;&lt; "parsing all data successfully" &lt;&lt; std::endl; 
    else 
      std::cout &lt;&lt; "parsing data partially" &lt;&lt; std::endl; 
    std::cout &lt;&lt; pi.length &lt;&lt; " characters parsed" &lt;&lt; std::endl; 
  } 
  else 
    std::cout &lt;&lt; "parsing failed; stopped at '" &lt;&lt; pi.stop &lt;&lt; "'" &lt;&lt; std::endl; 
} </pre>
<ul class="programlisting"><li><a class="programlisting" href="src/12.4.2/main.cpp">Download source code</a></li></ul>
<p>In addition, the example associates an action with <code class="classname">boost::spirit::real_p</code>. While most parsers pass a pointer to the beginning  and end of the identified data, <code class="classname">boost::spirit::real_p</code> passes the discovered number as <code class="type">double</code> instead. This certainly makes processing of numbers much easier since they do not need to be converted explicitly. In order to pass a value of type <code class="type">double</code> to the action, a corresponding overloaded  <code class="methodname">operator()()</code> operator has been added to <code class="classname">print</code>.</p>
<p>Besides the parsers introduced in this chapter such as  <code class="classname">boost::spirit::str_p</code> or <code class="classname">boost::spirit::real_p</code>, Boost.Spirit offers many more. For example, <code class="classname">boost::spirit::regex_p</code> is helpful if regular expressions should be used. In addition, parsers to verify conditions or execute loops exist as well. This helps creating dynamic parsers which process data differently depending on conditions. In order to get an overview over the utilities provided by Boost.Spirit, one should take a look at its documentation.</p>
</div>
<hr>
<h2 class="title">
<a name="parser_exercises"></a>12.5 Exercises</h2>
<div class="sect1">
<p class="solution">
              You can buy 
              <a target="_top" href="http://en.highscore.de/shop/index.php?p=boost-solution">solutions to all exercises</a>
              in this book as a ZIP file. 
            </p>
<ol><li class="listitem"><p>Develop a calculator that can add and subtract arbitrary integers and floating-point numbers. The calculator should accept input such as <strong class="userinput"><code>=-4+8 + 1.5</code></strong> and display <code class="computeroutput">5.5</code> as the result accordingly.</p></li></ol>
</div>
</div>
<hr class="hrfoot">
<p class="copyright">Copyright © 2008-2010 
        <a class="link" href="mailto:boris@highscore.de">Boris Schäling</a>
      </p>
</body>
</html>
